An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

Abstract

Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most of the existing algorithms, whose components are separately trained and tuned. (2) It naturally handles sequences of arbitrary length, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performance in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios. The experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets, demonstrate the superiority of the proposed algorithm over the prior art. Moreover, the proposed algorithm performs well in the task of image-based music score recognition, which evidently verifies its generality.
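
The abstract does not say how the transcription step turns per-frame predictions into a label sequence. As an illustration only, the sketch below assumes a CTC-style greedy decoding (merge repeated labels, then drop a blank symbol), which is one common way to read out variable-length text without character segmentation. The lexicon-based mode is approximated here by picking the lexicon word closest in edit distance to the lexicon-free output, which is a simplification and not necessarily the paper's procedure. The names `BLANK`, `greedy_collapse`, `edit_distance` and `nearest_in_lexicon` are hypothetical.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol; not specified in the abstract


def greedy_collapse(frame_labels, blank=BLANK):
    """Lexicon-free readout: merge consecutive repeats, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in collapsed if label != blank)


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def nearest_in_lexicon(prediction, lexicon):
    """Lexicon-based readout (simplified): closest lexicon word by edit distance."""
    return min(lexicon, key=lambda word: edit_distance(prediction, word))


# One best label per image column; the blank separates the two 'l's.
frames = "hh-e-ll-lo--"
free_text = greedy_collapse(frames)
constrained = nearest_in_lexicon("hallo", ["hello", "world", "shell"])
```

Because the readout works on however many frames the image produces, the same function handles short and long words alike, which matches property (2) in spirit.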